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Abstract —This article introduces the Stochastic Texture Dif¬ 
ference method for analyzing data at prescribed spatial and 
value scales. This method relies on constrained random walks 
around each pixel, describing how nearby image values typically 
evolve on each side of this pixel. Textures are represented as 
probability distributions of such random walks, so a texture 
difference operator is statistically defined as a distance between 
these distributions in a suitable reproducing kernel Hilbert space. 
The method is thus not limited to scalar pixel values: any data 
type for which a kernel is available may be considered, from 
color triplets and multispectral vector data to strings, graphs, 
and more. By adjusting the size of the neighborhoods that are 
compared, the method is implicitly scale-dependent. It is also 
able to focus on either small changes or large gradients. We 
demonstrate how it can be used to infer spatial and data value 
characteristic scales in measured signals and natural images. 

I. Introduction 

A. What this work is about 

In this paper, we present a local algorithm, a low-level 
and scale-dependent data analysis that is able to identify 
characteristic scales in data sets, for example (but not limited 
to) images. The main idea is to statistically characterize how 
data values evolve when exploring each side of a given pixel, 
then compare the statistical properties of each side. We exploit 
random walks on images but in a different setting than m. 
Our method resembles the graph walks in (2), except that we 
additionally introduce a statistical comparison operator, that 
our method is adapted to images, directional, accounts for 
neighborhood information and is intrinsically scale-dependent. 
Indeed, exploring the neighborhood of each pixel implies 
setting two scales: the spatial extent of that exploration, and 
the sensitivity to pixel value differences along each explored 
path. Once these are set, we quantify the difference between 
each side, and repeat the operation in other orientations around 
a pixel. This gives, for each pixel and each prescribed set of 
(spatial, data) scales, a measure of how that pixel is a transition 
between distinct statistical properties on each of its sides. 
Depending on how these scales are set, we may for example 
get a new kind of low-level boundary detector, sensitive to 
either small data changes or large gradients, and at different 
resolutions. The method can be used to infer characteristic 
scales in natural data, these at which the boundaries capture 
most of the information. The method is not restricted to 
images, it is readily extendable to voxels or other spatially 
extended data (e.g. irregular meshes). It is not restricted either 
to scalar (e.g. gray) valued pixels: any data type for which a 
reproducing kernel is available can be used (e.g. vector, strings, 
graphs). We present in particular the case for color triplets in 
Section EH 


B. What this work is not about 

Our goal is to provide a local algorithm, that works at 
pixel level, and not to perform global image segmentation. The 
best algorithms for segmentation account for texture informa¬ 
tion O, exploit eigen-decompositions for identifying similar 
regions m, or improve their results with edge completion 
o. These high-level operations are very efficient at finding 
transitions between general zones in the image 0, but we 
want a local quantifier that can be used to infer characteristic 
scales within acquired data, which is a different problem from 
image segmentation. It may be the case that our methodology 
becomes useful in a segmentation context, but this topic is out 
of the scope of the present paper. 

Similarly, our goal is not to perform high-level texture 
classification m, but to formulate a new way to represent 
textures and their differences (Section [HI]). As a quantifier for 
texture differences, our method might provide future works 
with a new way to compare an unknown sample with a ref¬ 
erence database. However, it remains to be seen how efficient 
that would be compared to well-established alternatives. In 
particular, we do not decompose data on a basis of functions, 
such as wavelets 0 , curvelets j9|, bandelets ins and other 
variants HD. Neither do we adapt the decomposition basis 
using a dictionary fl2l or define elementary textons lfl3l for 
description and classification purposes. We compare distri¬ 
bution of probabilities, which we identify with the textures. 
It may be that our method ultimately helps in defining new 
optimal decomposition of images on some basis of elementary 
textures, but that is another issue, which falls out of the scope 
of this article. 

C. Comparison with related approaches 

Multi-scale analysis can either be performed by exploiting 
relations and patterns across scales, or by defining features 
per scale, and then investigating how these evolve as the scale 
varies. The first category includes all wavelet-based methods 
mentioned above, as well as more advanced techniques like 
multifractal analysis fl4l . fT5l . By definition, methods in this 
first family do not allow finding characteristic scales, they 
exploit relations between scales, for example to build texture 
descriptors. The second category, a scale-dependent analysis 
which is then repeated at multiple scales, is prominently 
represented in the field of image processing by space-scale 
decompositions m. This way, edges are found at different 
levels of details fTTI . hopefully allowing the user to remove all 
extraneous changes and focus on important object boundaries 
EU. Any scale dependent analysis might be used in this 
second family, and in particular pixel connectivity, which is 
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sometimes used to obtain multiscale descriptions, see llT9l for 
a review. Our method falls in the second category, but with 
a completely new feature involving sets of random paths. It 
allows to specify both the spatial scale and the data scale for 
the analysis, and to look for optima in this two-dimentional 
parameter space which are presumably characteristic of the 
data, see Section [V| 

We then use statistics to quantify how data values evolve 
in a pixel neighborhood at these prescribed spatial and data 
scales. l20l also uses statistics to compare data distributions at 
multiple scales, but with a different framework and for clas¬ 
sification purposes. We use random walks in order to explore 
the neighborhoods. Although we use a Markov Random Field 
(MRF) approach in order to fix the random walk transition 
probabilities, our approach is fundamentally different from the 
typical use of MRF in image processing (see ETI for a review). 
We use the MRF solely as a intermediate step to calibrate the 
random walks with a spatial scale, and not as a model for pixel 
values, or for segmentation El, 122j. Anisotropic diffusion 
methods, from the Perona-Malik model l23l to space-scale 
extensions with partial differential equations f24l . might seem 
related to this random walk approach. Indeed, gaussian filter¬ 
ing could be seen as integrating pixel values along random 
paths in the neighborhoods we use. However, we do not 
integrate along random paths, but we rather exploit how pixel 
values evolve along these paths in order to define a statistical 
distribution of these variations in the neighborhood, which we 
then identify with texture information (Section |III-B }. Our 
method can thus be seen as complementary to diffusion. A 
notable multiscale extension of anisotropic diffusion to vector 
data can be found in f25l . and it would be interesting to 
compare how introducing a reproducing kernel in their setup 
complements our own technique. 

Our method is applicable to non-scalar data, including non¬ 
vector data, by means of reproducing kernels. Yet, we do not 
use kernel methods as in machine learning of texture features 
(26), (3, but as an indirect way to quantify discrepancies 
between probability distributions, building on recent findings 
in statistics G2, [2]|]]. Consequently, for the particular case of 
kernels on color spaces which we demonstrate in Section [VI| 
our method is fundamentally different from current approaches 
in the domain f29l . Our hope is that this ability to work with 
non-scalar data will make our method a prominent tool for the 
analysis of multispectral and other kind of images. 

The stochastic texture discrepancy (STD) which we present 
can also be seen as a kind of “texture gradient” (see Section 
|V-Q . Unlike related work, which use wavelet transforms [30 j, 
morphological operations ED, or combinations of both f32l 
to define texture gradients, the STD is a norm in a suitable 
reproducing kernel Hilbert space. Consequently, we expect that 
it could be also incorporated in classic gradient-based edge- 
detection algorithms l33l . or any other setup in which a proper 
norm is required. 

Section II presents the theory, starting with how texture is 
encoded and Section III how texture information is compared 
on each side of a pixel. Section IV gives a first application 
of the method, how it can detect texture boundaries accouting 
for scale information. Section V builds on this and presents 


a fully automated method for detecting characteristic scales 
in acquired data (including, as in this paper, images). Section 
VI demonstrates that our method deals equally well with non¬ 
scalar data, including, in that application case, color triplets. 
Color texture difference is shown to better detect boundaries 
and highlight small details than alternatives. 

II. Encoding texture information 

In the following, bold mathematical symbols denote pixels, 
plain symbols are values associated to these pixels. Capital 
mathematical symbols (resp. bold, plain) represent sets (of 
resp. pixels, values). Mathematical spaces are calligraphied. 

A. Overview 

Consider a pixel o. An orthonormal basis (x,y) in the 
image plane being fixed (for instance the one aligned with the 
boundaries of the image), we consider the affine basis (o, x, y) 
with the origin at that pixel center. We want to compare 
how the texture on the half-plane {x > 0} is statistically 
different from the texture on the other side {x < 0} (Fig.0. 
A length scale is needed to characterize the extent of the 
texture comparison within each half-plane. Let A be this spatial 
length scale. Noting iV ± (o) neighborhoods of pixels within 
the two half-planes, consider the probability densities defined 
on N ± (o) by p±{x,y) = C-exp (^~ x ^2 ) for ±x > 0 and 
0 otherwise, where C is a normalizing constant. The texture 
characterization we propose amounts to: 

A: monitoring pixel values taken along n independent 
random paths in each of these neighborhoods N ± (o ); 

B: in a way that the probability to visit a pixel a G N ± (o) 
is given by p ± (a) = JJ x yea p ± (x,y)dxdy; 

C: and that the average spatial extent of a random path is 

A. 

As a result of this process, two sets S ± (o) are obtained 
for each pixel o along each pair of opposite direction^] 
These sets each contain n random sequences of pixel values 
(v ( o,i )) i=1 m , with v(a) the image information (gray scale, 
RGB, etc.) at pixel a. A set of paths thus contains the statistical 
description of a texture on the corresponding side of a pixel 
o, at a prescribed characteristic spatial scale of A. 

Each of the above points is detailed separately in the 
following subsections. 

B. Generating random paths 

Each pixel location around o is considered to be a distinct 
state of a Markov chain. Transitions are allowed to the 4 
neighborhood pixels, together with self-transitions. A random 
pixel in a± G N ± (o) is chosen as the starting point, with a 
probability given by the limit distribution p ± (ai). 

The Markov chain is then run for m steps for generating 
the pixel sequence (a*) i=1 m . The values of these pixels are 

Pixels along the boundary line may appear in each set, as shown in Fig. [T] 
In this vertical line example pixels fall on either side with equal probability, 
but the method is applicable for lines at any angle, in which case p~ (a) / 
p+(a), both being determined by the area of the pixel falling on either side 
of the line as specified by p± (a) = ff x yea p± (x, y)dxdy. 








3 


Left side N~(o) s Right side N*(o) 







1 

1 

1 











1 

1 











1 

1 

1 






















f 

Rar 

ide 

>m 

pa 

th 



Pi 

xel 

O 

l 

:f> 

I 









J 

f 

A 









1 

ir 


—■ 








h 

F 

a vg e 

;xte 

nt= 

A 







j , 

also 

= de 

v N 

Ho) 








Pat 

h le 

ngt 

1 = 

m 



Figure 1. General method for generating stochastic texture descriptions. The case for a left/right split is presented, but other orientations with any angle are 
processed in exactly the same way. Notations and a step by step description are given in the main text. 


then taken along this path, so that s = (si = v (a,i )) i=1 m G 
S' ± (o) is a length m excerpt of the texture in N ± (o). The 
process is repeated n times to get the full set S ± (o ), in 
each direction, which is assumed to encode the full texture 
information provided n is large enough (Fig. [lj. 

C. Building the Markov chain 

For each pixel a, hence each state of the Markov chain, 
we first compute p ± (a). This is the probability of reaching 
this state in the limit distribution of the Markov chain. We 
now need to define a set of transitions that leads to this limit 
distribution. 

Letp ± (a -A b ) be the transition probability from state/pixel 
a to state/pixel b. There are at most 5 non-null such probability 
transitions, from a to its neighborhood pixels hi ...4 and to 
itself 65 = a. The Markov chain is consistent with the limit 
distribution when: 

5 

P ± ( a ) - '52p ± ( b i)P ± ( b i a ) (!) 

i= 1 

while respecting the constraint 

5 

J2p ± (a^ b i) = 1 ( 2 ) 

i=1 

The unknowns in this problem are the transition probabili¬ 
ties for every pair a, b. We start by assigning initial transition 
probabilities pf ni (a -» b t ) = p ± (b i )/ Yn=iP ± ( b i)- This, of 
course, does not provide the correct transition probabilities, 
just a starting point. We then run an iterative Levenberg- 
Marquardt optimisation scheme in order to solv<^] EqjT] and 

Eq@ 

2 We use the solver 03 and reach a typical sum of squared error the order 
of 10 -30 for these conditions for both equations, with a maximum of 10 -14 
for the smallest neighborhoods. We nevertheless renormalize condition Eq@ 
to ensure a strict equality. 


D. Ensuring sequences are consistent with the spatial scale 

For each sequence length m, we note the average extent 
e(x) in x for generated sequences of this size: e{x) = 
(maxj< m (x (a*)) - (x (a*))}, with x (a) denoting 

the x coordinate of the center^ of pixel a. Averaging e(x) 
is done numerically over a large number of samples. We then 
select the value of m that gives a sequence of spatial extent 
e{x) = A on average. The typical error on A is less than 0.01 
pixel for the neighborhoods we use. 

Building the Markov chain and defining m needs to be 
done only once for each neighborhood size A and for each 
considered orientation of the neighborhoods N ± (o). Distinct 
orientations lead to distinct p ± (a) = ff x ea p ± (x, y)dxdy 
for every pixel a around o. We limit ourselves in this paper 
to straight and diagonal orientations, so the densities p ± (a) 
present a convenient symmetry in each of these two cases, but 
any angle is possible. Once these are precomputed, the same 
(straight, diagonal) Markov chains can be applied to each pixel 
o in the image (boundary conditions are dealt with in Section 
iHTDl ). 

III. Comparing textures 

The procedure described in the previous section is ap¬ 
plied to each pixel on an image, leading to 4 pairs of 
sets *S ± (o) for each pixel o, corresponding to both straight 
and diagonal directions. Each set comprises n sequences 
s = (si = v (ai)) i _ 1 m of length m, with m chosen so that 
sequences extend spatially to a characteristic scale A in the 
corresponding direction on average. This section details how 
to quantify the difference d (S~ , 5 f+ ) between two sets 5' ± (o), 
hence between textures on opposite sides of a pixel. 

A. Comparing two sequences of pixel values 

Let s = (si = v (oi)) i=1 m be a sequence as defined 
above. Let V be the space taken by pixel values v(a) G V, so 
that s G V m . Consider a kernel k : V m x V m -A M, such that: 

3 Or, equivalently, the mean of x on the surface of pixel a in this case with 
centered pixels 
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- k (s, •) is a function, in the Hilbert space H of functions 
from V m to M. 

- For any function / E H and for all s E V m , the 
reproducing property holds: (f,k(s,-)) n = f(s ) with (•, -) n 
the inner product in H. 

- Span{/c(s, -)} seV m is dense in 1~L 

As a special case, the kernel value k(s~,s + ) = 

(k (s - , •), k (s + , -)) n gives the similarity between s~ and s + . 
As is customary, we normalize k such that k(s,s) = 1 and 
k (s, t / s) < 1 for all s, t E V m . 

This classic reproducing kernel Hilbert space (RKHS) rep¬ 
resentation lets us consider any pixel value: scalar (e.g. gray 
scale, reflectance), vector (e.g. RGB, hyperspectral compo¬ 
nents), or in fact any kind of data for which a kernel is 
available (e.g. graph, strings). Indeed, once a kernel is avail¬ 
able for an element E V, it is always possible to use the 
product kernel for the sequence s G V m , hence to compute 
the similarity between two sequences. The method we present 
is thus generic, and applicable to wide range of “images”, 
broadly defined as spatially extended data. In particular, one 
way to handle color textures is presented in Section [VI] 
Similarly, definitions in Section [II] are easily extended to 
more than 2 spatial dimensions if needed (e.g. Markov chains 
transitions to the 6 voxel neighbors in 3 dimensions), or even 
to non-regular meshes (by computing p ± (a) for each mesh 
cell and transitions as above), or graphs (provided a meaninful 
p ± (a) can be provided for each node a). 


B. Quantifying texture differences 

The random sequences defined above can be seen as samples 
of an underlying probability distribution over V m , which we 
identify with the texture information. It is possible that visually 
distinct patterns lead to the same distribution over V m , as 
we offer no uniqueness proof, and especially with n < 00. 
However, the same visual ambiguity exists for other and 
widely used texture representation methods, for example when 
retaining only a small number of components in a suitable 
decomposition basis (GO, Go), El). We make no claim on the 
superiority of our method in this respect, but it certainly offers 
a very different characterization of textures, complementary to 
other approaches. 

Let us thus identify a texture as a probability distribution 
P(s) over V m . In this view, the sets S ± (o) are collec¬ 
tions of observed samples for the textures in the neighbor¬ 
hoods N ± (o). Quantifying the texture difference d(S~,S + ) 
amounts to performing a two-sample test for similar distribu¬ 
tions, with similarity defined by a metric on probability dis¬ 
tributions. Using the above RKHS representation, we exploit 
recent litterature m, lf28l to propose a consistent estimator 
for d(S~,S + ) that works reliably even for small number of 
samples n, irrespectively of the dimension of V and V m : 

- For a distribution P, the average map /ip E M is given 
by p P = E P [fc(s,-)]. 

- An estimator fis is easily obtained from the sample set 
S, in the form ji s = ^ Y^= 1 & (<s J , •), where s J denotes the 
j -th member of the set S. 


- Let P and Q be two distributions over V m . If the kernel is 
characteristic, then P = Q if and only i^]|| /ip — /iq \\ n = 0. If 
P / Q, the norm directly quantifies the discrepancy between 
these distributions. 

Thus, the texture difference d(S~,S + ) can be estimated 
with: 


d 2 {S-,S+) = \\fis + -^s-\\n 0 ) 

= (A S+ - Ps~ > Ps+ - P'S- )u 

n n 

= ^J2'E( k (s+,s k f)+k(s j _,s k f-2k (A,s-)) 
3=1 k=1 

Thanks to being a consistent estimator [28] for a norm in a 
suitable space, the operation we propose behaves very much 
like a pixel-based gradient norm, but on textures differences 
instead of pixel values, realizing of form of “texture gradient” 
(See Section [WC]). 

We define an overall Stochastic Texture Discrepancy 
STD(o) for each pixel o by considering the norm in the 
product space 0 q 1~Lq over all directions 0 , normalized by 
the number of directions, such that 

STD(o) 2 =j^(,SpS+) (4) 

with 6 indexing in this paper the horizontal, vertical and the 
two diagonal pairs of sets. 

The whole procedure is easily generalizable to higher di¬ 
mensions (e.g. voxels), to arbitrary angles 0 (by computing 
the Markov chain for the limit distribution of the rotated 
neighborhood), to anisotropic neighborhoods, etc. 


C. Choosing an appropriate data scale 

In the results presented in the next sections, we use the 
inverse quadratic kernel for scalar valued pixels: 


/ 1 m 

k(s,t) = 1/ n— (Sip - upf 

\ i =1 


( 5 ) 


This kernel is characteristic ED, and we found that it 
performs as well as the Gaussian kernel, but it is much faster to 
compute. We have normalized the sum of squares by m, so that 
straight and diagonal computations have comparable kernel 
values. The parameter s sets the scale at which the interesting 
dynamics occur in the observed signal values, and should be 
set a priori by the user or a posteriori in order to optimize some 
objective criterion. In Section IV-B we use the reconstruction 
Peak Signal to Noise Ratio (PSNR) as a measure of accuracy 
for each value of A and s. Other applications might have 
different criteria, for example a classification rate in a machine 
learning scenario. The important point is that the optimal 
spatial and data scales should be set a posteriori with such 
criteria, as it is not guaranteed that maximal PSNR (i.e. 
minimal squared reconstruction error) matches the optimal 


technically, this is only true up to an arbitrary set with null probability. 
But such sets cannot be observed in practice in sampled data, hence they can 
be safely ignored. 
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criteria for every application (e.g. classification). Moreover, 
in most images, some spatial variations for the optimal n 
are expected, since not every part of the same image reflect 
the same object. Hopefully, if so desired, our method can be 
applied on arbitrarily shaped sub-parts of such images simply 
by masking out other parts as missing data, as detailed in the 
next section. 

D. Handling missing data and image boundaries 

Some images, like the satellite monitoring of sea surface 
temperature in Fig. [4j present missing data for some pixels, 
in this case corresponding to land masses. In that case, entries 
in some sequences {si} i<m may be missing. We deal with 
the issue by modifying the kernel to maintain a normalized 
average over the remaining entries in these sequences: 

k(s,t) = 1/ ^1 + |k y^(si//t-^//t) 2 j ( 6 ) 

where C is the set of common indices where both se¬ 
quences have valid entries. When C = 0, the kernel value 
cannot be computed. In that case, we average over remaining 
kernel evaluations, separately in each term of Eq. [3] When 
d(S~,S + ) cannot be evaluated in some directions S ± , then 
we average over remaining directions in Eq. [4] to define the 
texture discrepancy at that pixel. When all directions are 
invalid, for example in zones of missing data larger than the 
neighborhood N ± (o ), we set the texture discrepancy to Not a 
Number, indicating missing informatioij^] When a large zone 
of missing data is present, then some directions have to be 
ignored and textures are compared in the remaining directions. 
This process adaptatively handles contours of missing data, 
discarding automatically directions toward missing data and 
considering only directions tangent to the large invalid zone. 

This effect is exploited to deal with pixels at the boundary of 
an image. Standard techniques for handling such boundaries 
involve extending border pixel values, mirroring, or setting 
outside pixel values to zero. However, these methods are only 
needed to work around the inability of an image processing 
algorithm to deal with missing data. Since we can, the natural 
choice for pixels outside the image is to consider them missing. 
Thanks to the above method, computations at an image border 
are performed in exactly the same way as for any pixel within 
the image. Textures to the very border of the image are taken 
into account and, importantly, without introducing spurious 
discrepancies (such as setting outside pixels to null would) or 
spurious similarities (such as mirroring or extending border 
pixel values would). 

IV. A SCALE-DEPENDENT TEXTURE DISCREPANCY 
DETECTOR 

A. Demonstration on a synthetic test picture 

The synthetic image of Fig. [2] left was designed to demon¬ 
strate the behavior of the algorithm on elementary patterns 

5 STD values for missing data are also produced within the range of the 
neighborhood of a valid data point, albeit with reduced precision as the 
distance from that point increases. In practice, we ignore these STD values 
and retain only points where the original data was defined. 


such as one-pixel width lines and chess boards. Labels in Fig. [2] 
(middle, right) are added for clarity of the discussion. A: points 
within thin (isolated) 1-pixel lines have the same neighborhood 
on each side of the line. Points next to these lines have a 
maximal difference between neighborhoods at 1-pixel scale, 
resulting in detected edges on each side of the line. At scale 
2, points two pixels away also have asymmetric neighborhood, 
although less than the nearby pixels, hence localization of 
the maxima is preserved between scales A = 1 and A = 2 
(for small n values, see Section III-C ). B: sharp black/white 
transitions result in two-pixel wide boundaries, one pixel on 
each side of the transition, with weaker contributions further 
away at scale A = 2. C: lines separated by 1 pixel are fully 
recognized as being part of the same texture at 1-pixel scale, 
hence disappear in the stochastic texture difference (STD) 
maps at scales A = 1 and A = 2 . These STD values are 
nevertheless not null, unlike that of the uniform background, 
hence these patterns are grayed out instead of blanked out. 
Boundaries around these texture regions are detected, as in the 
isolated line case. D: chessboard patterns are also recognized 
at both scales, together with surrounding edges. The difference 
between the chessboard pattern and the spaced-out squares 
below is weakly detected in both cases, while the black/white 
inversion within the checkboard pattern is only visible at 
A = 1. E: this repetitive pattern is better recognized at spatial 
scale A = 2 and pixel value scale n = 0.25 (the average value 
over a repeating block), but its elements are too isolated at 
scale A = 1. The boundaries of that texture are also found at 
scale A = 2. Pixels in the region below are too far apart for 
A < 2. F: thin lines separated by 2 pixels also become part of 
the same texture with a surrounding edge at A = 2, while lines 
separated by 3 pixels are isolated in both cases. G: The spiral 
center is a non-repeating pattern of lines 1 pixel apart, which 
can be recognized at scale A = 2 (see the black branches 
within the spiral), while the spiral branches 3 pixels away are 
still isolated. H: the thin diagonal lines behave similarly than 
vertical and horizontal lines. Lines too close at either scale are 
merged into a unified texture with a surrounding edge. 

This example clearly demonstrates the effect on changing 
the spatial scale A. The next section also shows the effect of 
changing the data scale k, on a real image presenting both low 
and large gray scale contrasts. 


B. Typical behavior of the algorithm on a real image 

The effect of analyzing the image at different spatial and 
data scales is demonstrated is Fig. [3] The classic “cameraman” 
picture is shown in Fig. [3^. This picture presents jpeg quan¬ 
tization artifacts in the sky region and around the coat (A), 
large contrasts for the coat, the tripod and the camera handle 
(B), and fine grass texture with intermediate contrasts (C). 
Buildings in the background form another medium contrast set 

(D) . At a very low k = 1/256 of one gray level quantization 
(Fig. [3])), the STD is sensitive to small changes at that level. 
Boundaries are detected on jpeg artifacts and within the coat 

(E) . On the other hand, all medium and large gray level 
gradients >> n are ignored, as can be seen on the coat edge, 
camera handle and tripod (F) that have completely disappeared 
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Figure 2. Reference black and white picture (left), stochastic texture differences at spatial scales of A = 1 (center) and A = 2 pixels (right), both with 
value scale k, = 0.25, averaged over 10 independent computations with n = 500. White stands for a null difference, black for maximum discrepancy. Capital 
letters markers are commented in the main text. 



Figure 3. Scale-dependent edge detection, a. Original 256 x 256 gray scale 
cameraman image, b. Analysis for A = 3, k = 1/256. c. Analysis for A = 1, 
k = 1. d. Analysis for A = 3, k = 1. Capital letters markers are commented 
in the main text. 


in this picture. The grass texture at intermediate gray level 
contrasts is also ignored. Conversely, setting k = 1 in Fig. (3}c 
and Fig. |3ji, results in edges that are sensitive to large gray 
level differences, with the reversed situation of ignoring small 
gray-level discrepancies. Buildings with intermediate contrast 
result in intermediate edge values (G). At A = 1 pixel in 
Fig. (3jc, edges are thin (H) but some jpeg quantization blocs 
are still faintly visible around the coat (I). This spatial scale 
is too low for matching the grass texture (J). With a larger 
A = 3 pixels in Fig. [3ji the edges become smoother (K), but 
jpeg artifacts are completely gone. Details of the texture finer 
than A = 3 are merged (L). 


The ability of an STD analysis to focus on specific scales 
could be very useful, for example, as a filtering guide for 
artifact removals (low k) or for implementing new edge 
detectors (large k). 


C. Use on physical measuments with characteristic scales 

The “cameraman” example demonstrates the ability of the 
algorithm to analyze and highlight image features at prescribed 
spatial and data scales. This is especially useful when dealing 
with physical data, for which processes occur at known scales, 
independently of the sampling resolution by which that process 
is measured (i.e. A is better expressed in real units rather 
than in pixels). As a demonstration, Fig. [4jeft shows remotely 
sensed sea surface temperature (s.s.t.), acquired in the infrared 
band with the MODIS instrument aboard the Terra and Aqua 
satellites. The covered region ranges from 15°E to 70°E in 
longitude and 0°S to 60°S in latitude. We use an 8-day 
composite image in order to remove the cloud cover. Missing 
data thus correspond to land masses, and is dealt with as 
described in Section |III-D| The floating-point values of the 
temperature are used as input, from -1.2°C to 31.5°C. The 
processes we are looking for are oceanic currents, which 
typically induce a temperature variation of at most a few 
degrees, so we take a characteristic data scale of k, = 1°C. 
Their width is variable, between 50-100km, with transitions 
zones to the surrounding water of a few km, so we take an 
a priori spatial scale of A = 75km. Fig. [4jight shows the 
result of the analysis of the sea surface temperature with these 
parameters, which clearly highlights the dynamics of the ocean 
at the prescribed scales. 


V. Multi-scale analysis 

A. Method 

The previous section presents the case where scales are 
chosen a priori, according to the user’s knowledge or objec¬ 
tives. But such information is not always available: in many 
cases, the goal is precisely to find what are the relevant 
scales to describe a data set. We thus need a methodology to 
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Figure 5. Classic reference images. From left to right, lena, barbara, house and sea star. We use the versions of lena, barbara and house from ED. The sea 
star picture is the training example 12003 from the Berkeley segmentation data set (6). We choosed it as it presents some background texture with higher 
intensity spots, some blurry and some sharp edges. 
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Figure 6. Influence of the two scale parameters A and k on the reconstruction accuracy, measured with the peak signal to noise ratio (PSNR) as defined 
in the main text, when retaining 20% of the points with higher STD. Averages of 5 independent computations with n = 100 were used for computing the 
PSNR. The minimum and maximum values for the PSNR are given in Table [T] for each image. Using these per-image bounds, blue is the lowest accuracy, 
red the highest. The spatial scale A is given in pixels, the data scale k, is given for [0..1]-normalized data values. An additional scale k x 256 is shown on 
the top to ease the interpretation of gray-scaled image results. Conversion factors for the sea surface temperature data are k « 0.03 for 1°C and A « 1 pixel 
for 8.46 km at the center of Fig. [4] 



Figure 4. Sea surface temperature. Left: MODIS data, dated 01/01/2006, 
covering 15°E to 70°E in longitude and 0°S to 60°S in latitude in 628 x 684 
pixels. The floating-point temperature is shown in color scale from blue (- 
1.2°C) to red (31.5°C). Missing values are shown in black. Right: Analysis 
with k = 1°C and A = 75km (at the center of the picture). 


identify which are the “best” scales in an image, with objective 
measurement criteria. 

One hypothesis is that the “best” scales should identify 
correctly the most relevant structures in a data set, which we 
identify with the structures carrying most of the information. 
Assuming this is the case, then we propose the following 
methodology: 

- Analyze the data for a given pair of scales A and k. 

- Select the 20% most discriminative points, taken to be 
pixels x with the largest d(x) STD values. 

- Reconstruct the image from only these points, according 
to the method detailed below. 

- Compare the reconstructed image with the original. 

If the features are correctly determined and they indeed 
carry most of the information on the image, then the recon¬ 
struction should be very close to the original. This procedure 
can also be seen as a lossy compression of an image: informa¬ 
tion in the discarded pixels is lost and these are reconstructed 





Table I 

Reconstruction accuracy when retaining 20% of the points 

AND BEST SCALES FOR THE TEST IMAGES. 


Image 

Best A 

Best k, 

min PSNR 

max PSNR 

cameraman 

1 

0.41 

12.0 

28.3 

house 

5.5 

0.64 

13.5 

26.2 

lena 

3 

0.094 

13.6 

21.0 

seastar 

3.5 

0.11 

13.7 

18.2 

s.s.t. 

7 

0.027 

10.8 

23.9 

barbara 

3.5 

0.64 

13.0 

17.7 


from the retained pixels. Perfect features holding all the 
information would mean perfect restoration of the image after 
decompression. We emphasize that our goal is not to obtain the 
largest PSNR from the minimum number of retained points, 
which is related topic adressed by other works in the litterature 
EU. Rather, we wish to use the simplest method, so that 
the reconstruction accuracy measures the sole influence of A 
and ft, and not their joint effect with reconstruction methods 
designed to improve the PSNR. This excludes anisotropic 
diffusion |[36) , weighting (371 . exploiting dictionaries l38l . etc. 

Reconstruction is thus performed by minimizing a least 
square error, equivalent to solving a Poisson Equation 139], 
14Q]. More precisely, considering all pairs of neighbor valid 
pixels a and b (i.e. both are within image boundaries and 
without missing data), we seek the reconstructed image J with 
pixel values vj G V such that: 


J = <vj : min ^ ( vj(b ) - vj(a) - g(a, b)f 


(7) 


a.b 


Here g(a , b ) = vi(b) — vi(a ) is the gradient of the original 
image I whenever pixel a is retained amongst the 20% with 
largest STD value, and null otherwise. This reconstruction 
is unique, optimal in the least square sense, and does not 
introduce any extra processing step on the image. It can be 
computed simply with sparse optimizers, such as (34l . We 
then add the mean of the original image that was lost in the 
process before comparing J and J, and clamp pixels within 
the normalized [0,1] interval. 

In order to assert the quality of the reconstruction, we use 
the Peak Signal to Noise Ratio (PSNR) criterion, as defined 
by: 


PSNR (I, J) = 10 log 10 


| C\/J2(Ma)~vj(a)) 2 


aec 


where C is the set of common valid pixels in I and J. 


B. Results 

In addition to the cameraman and sea surface temperature 
images, we also analyze the “lena”, “barbara”, “sea star” and 
“house” images, shown in Fig. [5] Their multiscale analysis 
maps are shown in Fig. [6] and the PSNR maxima are given 
in Table [T| Scales leading to better global reconstructions are 
clearly visible, forming distinct patterns for every picture. 

Our method provides a new multi-scale analysis of an 
image, in order to highlight the characteristic spatial A and data 


ft scale of its components. For example, we can see that the 
house picture has a maxima zone at A = 5 to 6 and high ft: this 
corresponds to the spatial extent and the contrast difference 
of the white borders of the roof, window frames and water 
drain. Other maxima are located at A of 1 and 1.5 pixels, with 
medium ft gray scale contrast: these are the brick textures. 
With this method, automatically identified parameters for the 
sea surface temperature data are A « 59km and ft « 0.9°C. 
These are consistent with our a priori estimate in Sec. |IV-C| 
further validating our approach. 


C. Selective texture erasing 

Similarly, the stripped patterns of the Barbara image are 
correctly matched on both scales. Fig. [7] shows in region A 
a texture of slanted stripes with a characteristic size A < 2.5 
and a low gray level constrast ft. These textures are correctly 
detected in the PSNR map of Fig. [6] as the leftmost maxima 
with lower A values. But the image also comprises other 
patterns of increasing contrast and size, as can be seen in 
regions B. Most of them have a characteristic size A = 3.5, at 
which the PSNR reaches a maxima for a range of large gray 
scales ft. 

Fig. [7] (middle) shows that textures disappear in the re¬ 
constructed image from the 20% largest STD pixels. Some 
patterns, like in region C, present textures with a larger spatial 
extent. As was the case for the artificial image in section [2] 
these patterns are not recognized with a smaller A = 3.5, and 
they appear intact in the reconstruction. Our method can thus 
also be used for texture erasing (42l . (43], preserving edges 
between different regions and around textured elements (D) 
but selectively removing all textures below a prescribed spatial 
scale A. 

An alternative method is to seek the image with gradients 
(Eq-0 that are as close as possible to the texture difference 
values (Eq. [3j. Since there cannot be a linear correspondance 
between pixel gradients in image space and “texture gradients” 
in RKHS, involving a non-linear reproducing kernel, it is 
expected that distortions are produced in the reconstructed 
image, assuming a meaningful image can be produced in 
the first place. Technical details for how to combine Eq. [3] 
into Eq. [7] are given in Appendix C, and involve orienting 
the “texture gradient” with the average intensity on each side 
of a target pixel. Despite being necessarily imperfect, Fig. [7] 
(right) shows that a reconstruction can still be obtained this 
way, with a PSNR of 22.95 and erased textures below A, but 
with some amount of blurring. The interesting point is that the 
method works as intended, proving that our “texture gradient” 
operation in RKHS, when properly oriented, really behaves 
like a gradient. 

VI. Color texture differences 

The path generation procedure described in Section 
makes no assumption on the nature of pixel values v 
just that sequences t G V m can be compared with a suitable 
reproducing kernel. The case for scalar pixels is presented 
above, but the method works equally well on vector data. In 
this case, V could be a color space, a set of spectral bands in 
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Figure 7. Erasing texture elements smaller than A = 3.5 pixels. Left: Annotated original Barbara image. Middle: Reconstruction from 20% selected points 
at A = 3.5 and k = 0.64, corresponding to the maximum of Table [I] Right: Reconstruction with 100% points but using the RKHS “texture gradient” instead 
of the image gradient. 



Figure 8. Texture difference analysis of a color image. Top-left: original image (full credit given in Appendix), rescaled to 600x450 pixels. Top-right: Scalar 
analysis using a gray version of the image. Bottom-right: Vector analysis in Lab space with the standard D65 white point. Bottom-left: Vector analysis using 
the CIE DE2000 perceptual color difference improvement A E. All these analyses use the spatial scale A = 1.5 pixels and the data scale k = 0.15. 
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remote sensing applications, polarized synthetic aperture radar 
values, or any other vector data. We deal here with the most 
common case of a triplet of RGB values. We first consider a 
kernel k : V x V 4 R that compares two RGB triplets and 
produce a similarity value, then take the product space kernel 
in V m for comparing sequences. 

Specifically, assuming V is the set of all v = (R , G, B ) 
triplets, we first consider the non-linear conversion operator 
£ : V —y C, to the CIE Lab space C using the standard D65 
white point. £(v) is thus a vector with entries (L, a, b). Then, 
we implement the following color difference operators: 

- Si ( v, w) = \\£ (v) — £ (w) || £ /100, based on the original 
intent of the Lab space to be perceptually uniform, so the 
norm in C should match a perceived color difference. This 
first formula is fast to compute. 

- S 2 (v , w) = A E (£ ( v ), £ ( w )) /100, based on the revised 
CIE DE 2000 standard lf44l for producing a better perceptually 
uniform difference A E between two Lab triplets. This second 
formula is slower to compute, but presumably more accurate. 

The kernel between two sequences G V m is then easily 
adapted from Eq. [6] with the same notations: 

kd(s,t ) = 1/ ^1 + /k) 2 ^ with d = 1,2 

(8) 

The data scale ft is now applied to the chosen Si or S 2 
operator, in order to highlight small or large color differences. 
Note that we normalize Si and S 2 to be within [0 ... 1] instead 
of the [0... 100] Lab space range, so that ft values for the 
color case are comparable to the scalar case presented in the 
previous sections: assuming the gray scale perfectly matches 
a perceptual color intensity, then hi would have the same 
meaning in the color and the gray scale analysi^] Using the 
same normalized hi, the scalar and vector analysis in Fig. [5] 
are thus expected to have a similar, but not exactly identical, 
general contrast range. On the other hand, the CIEDE2000 
A E formula was designed to correct slight discrepancies in the 
original Lab space perceptual uniformity, hence the contrast 
levels of the two vector analysis should be very similar, with 
only minor changes due to the A E correction. This is what 
we observed in Fig. [8] 

Fig. [8] shows how the vector analysis, using either the Lab 
space norm or A E for color matching, enhances the detected 
features compared to the scalar analysis on gray scale. In 
region A, the color difference in the top-left of the trompe- 
l’ceil painting is well recognized in the vector analysis, but the 
contrast is not large enough in gray scale. The real wall and 
windows on the same house (B) are also much better analyzed 
in color, especially with the orange/beige difference. Another 
prominent difference is the lamp post (C) in front of the tree, 
which is lost in foliage details in the gray scale version but 
clearly contoured in the color analysis. Unlike points A and 
B, this effect is not only due to color contrast, but it also 
involves the texture difference between the lamp post and 

6 In practice, this is not exactly the case, and gray scale conversion is an 
active topic of research in. Here we use the Y component of XYZ space, 
which is also used by The Gimp free software default’s gray conversion tool. 


the tree foliage. Texture differencing between the two trees 
is clearly visible in D, where the green tree is well contoured 
in the color analysis, while both trees have similar textures in 
gray scale. The small patchs of blue sky (E) better stand out 
against more uniform foliage textures in the color versions, 
compared to the gray case where they are buried in noise. 
Color panels are also better recognized in F. The rainbow 
is not detected (G) by any of the analyses. But its slowly 
varying colors over a large spatial extent do not match the 
scales A = 1.5 pixels and ft = 0.15 that are used here. 

The effect of using the CIEDE2000 correction is visible by 
close inspection of the bottom images in Fig. [8] Compared 
to the original Lab space, the non-linear A E correction gives 
slightly more contrast to the lower color differences, which 
was one of its purposes in the first place [ 44 ]. These effects are 
visible throughout the image, but maybe more clearly seen on 
the water (H) and stone wall (I) texture details. However, these 
effects are so small that, for most applications, computing A E 
may not be worth the added cost compared to working in the 
original Lab space. 

In any case, our method provides a new way to include 
color information in image and texture analysis, highlighed 
especially by points A, B and C above. More generally, the 
method is applicable whenever a reproducible kernel can be 
defined to compare pixel elements, whether these are scalar, 
vector, strings, graphs, etc. 

VII. Conclusion 

We introduce a new low-level image analysis method, based 
on statistical properties within the neighborhood of each pixel. 
We have demonstrated its use on synthetic and real images. 
In particular, our algorithm is able to retreive characteristic 
scales in remotely sensed data. The method we present is not 
limited to scalar values and it is readily applicable to any 
kind of data for which a reproducing kernel is available. This 
includes vector data, and in particular color triplets. Thanks 
to being specified at prescribed spatial and data scales, our 
method implements a new kind of filter, very different from 
wavelet-based analysis m or decomposition on dictionaries 
C3. For example, it can detect small discrepancies like 
jpeg quantization artifacts while being insensitive to large 
luminosity gradients, which could be useful for guided filtering 
of these artifacts. In the spatial domain, our method can 
detect repetitive texture patterns, as shown in Figs. [2] and 
[7J and produces a unique contour around these elements. 
This low-level algorithm represents textures in a statistical 
framework that is quite different from classic approaches like 
textons EH, and it also represents multi-scale information in 
a new framework. Our new method thus complements these 
techniques, and together with them has the potential to provide 
really new feature descriptors for images, with properties that 
need to be explored in future works. 

Appendix A: Producing texture discrepancy for 

MISSING DATA 

For directions tangent to an invalid data zone, half the 
neighborhood is valid and half is invalid, in both £+ and S ~(at 
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worst, since N±(o) is centered on the center of the valid pixel 
o, not on the edge). For such a S ± pair, and replacing the 
neighborhood by another worst-case approximation of a binary 
valid/invalid choice (i.e. neglecting m and the spatial extent A 
with many valid pixels in the neighborhood), the probability of 
never selecting a valid pair of pixels decreases exponentially as 
0(2 -n2 ) according to Eq. [ 3 J tending to null probability in the 
large n limit. For a typical computation with n = 100, this a 
gives ridiculously small p < lO -3000 probability of an invalid 
result. Thus, for all practical purposes, valid pixels near an 
invalid boundary “always” produce a valid computation result 
for typical and small values of n. 

In fact, even missing pixels within large missing data zones 
benefit from nearby valid pixels. Although we explicitly ignore 
these STD values due to their low accuracy, our method 
still allows getting relevant values for isolated missing pixels. 
Consider a neighborhood centered on that pixel. In that case, 
paths around that point may occasionally fall onto the missing 
pixel, but patterns are still matched outside that pixel. Since 
the modified kernel is normalized only on valid pixels, the 
missing data simply results in a reduction of the effective 
m = \C\, but valid patterns surrounding the missing pixel 
are still matched correctly in every direction. The price to pay 
is a loss of reliability, especially as the nearest distance to a 
valid pixel grows. We do not exploit this feature in the data 
presented in the main text, where no STD value is produced 
for missing pixels, but this feature could be useful in other 
contexts, for example to simply ignore isolated missing pixels. 


gradients: min]T a 6 (vj(b) - vj{a) =b d (S~ (a), S+ (a))) 2 + 
(vj(b) — vj(a) ±d(S~(b),S+(b))) 2 . The signs are de¬ 
termined using the average value of the original im¬ 
age pixels in each neighborhood: sign d (S~ (a), 5 + (a)) = 

sign (E xeN+(a)P( X ) v d x ) - ExeN-(a)P( x M x ))’ and 
similarly for b. With this method, the result for A = 3.5 and 
k = 0.64 is shown in Fig. [7] right. 

Appendix D: Data and code availability 

The code used in this paper is free software, available 
for download at https://gforge.inria.fr/scm/?group_id=5803 
Images for Lena, Barbara, house, were fetched from 
J. Portilla’s web page and match these used in ED. 
We used the cameraman picture from McGuire Graphics 
Data at http://graphics.cs.williams.edu/data/images.xml The 
sea star is the training sample 12003 from the Berkeley 
segmentation S300 data set 0 and was downloaded 
from http://www.eecs.berkeley.edu/Research/Projects/CS/ 
vision/bsds/B SDS300/html/dataset/images/gray/12003 .html 
The sea surface temperature data was provided by 
the Legos laboratory http://www.legos.obs-mip.fr/ 
The color image in Fig. [8] comes from Wikimedia 
Commons, at http://commons.wikimedia.org/wiki/File: 
2013-10- 19_10-53-16-savoureuse-belfort.jpg This long 
exposure photograph was taken by Thomas Bresson with 
a polarizing filter, and released under Creative Commons - 
Attribution - 3.0. The original image was cropped and resized 
to 600 x 450 pixels. 


Appendix B: Combining scales 


In Section |IlLCl k sets the scale at which scalar pixel 
values are compared and should reflect a priori information 
on the image. In order to compare probability distributions, 
123 proposes to either take a Bayesian approach for letting 
k vary with a priori knowledge, or to take d(S~,S+) = 
sup^. d K (S~,S+) over a range of k for a more systematic 
approach. We found that the latter does not perform well in 
practice. For example, a picture captured by a digital camera 
sensor in low light conditions presents some digital noise on 
the pixel values. A similar effect occurs for quantization noise 
and jpeg artifacts (see Section [TV-B| ). Comparing distributions 
in V m at the scale of this noise is meaningless, as it would 
reflect differences due to the sensor (resp. quantization algo¬ 
rithm) itself, but not the differences in the image textures. 
Given the sensor fluctuations, the distance d K (S~, S+) at 
low k, values is larger that at the natural scale of the image 
signal, hence the sup^ approach does not work in this case. 
Physically, and for remotely sensed data in particular, the sup^. 
approach amounts to mixing together processes at different 
characteristic scales, which may have nothing to do with each 
other. 


Appendix C: Reconstruction from “texture 

GRADIENTS” 

For a pair of valid neighbor pixels (a, 6), consider the 
neighborhood 7V + (a) that includes b and N~(b) that in¬ 
cludes a. We add constraints in Eq. [7] using Eq. [3] for the 
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